Abstract

Telematics data provides information about vehicle location, speed, pickup and drop-off locations, acceleration, braking, time of trip, and more. Because resources on telematics data are quite limited, the first part of this capstone project aims at understanding and visualizing such data. Building on this part, we then pair telematics data with census data to identify patterns and contextualize the results, in this case looking at median income in Ohio during the COVID-19 era. With the spread of COVID-19 in 2020 and the Ohio governor issuing a stay-at-home order, many people had to transition to working from home, and many others, especially in low-income families, lost their jobs. On the telematics side, we wanted to see how the pandemic affected rideshare usage. We layered American Community Survey (ACS) median income data with pickup locations to look for trends. We found that the number of trips increased significantly in 2021, and that few pickups occurred in lower-income communities.
Keywords: Telematics Data, Census data, Visualizations, Ride-share apps

Introduction and Motivation

The problem this capstone project aims to address is two-fold. The first part is to document how to use and visualize telematics data, and the second is to pair telematics data with census data in order to find relevant patterns. During our first meeting, our sponsor, 99P Labs, asked us to analyze and visualize a telematics dataset they provided. The dataset was collected from a rideshare service and contained tables on pickup and drop-off locations and times, vehicle routes, safety metrics, types of customers, and more. The initial goal of the project was to find patterns in the data and build a story that interprets these patterns effectively. We were therefore each tasked with our own exploratory data analysis in order to come up with a diverse range of potential patterns. However, we quickly realized that before getting to the analysis, we needed to understand what the telematics data meant. The data dictionary provided did not adequately explain what the tables and variables encompass, and external sources offered little help with loading and visualizing the data. This brings us to the first problem we set out to solve: documenting what telematics data is and how to visualize it.

The second part of this project was to meet our sponsor’s request to create a story with the telematics data. Given that the telematics data spans 2020-2023, we thought it would be interesting to see how COVID-19 affected the distribution of pickup locations. We decided to pair the telematics data with census data to identify and investigate any trends that emerge. Specifically, we used census data on median income to calculate the percentage change in median household income from 2020 to 2021, and then overlaid the pickup locations from 2020 and 2021 to pinpoint any relevant patterns. Such a story can help establish whether there is a possible correlation between median household income and rideshare trips.

On a broader scale, telematics data can help identify ways in which both vehicles and transport systems can operate more efficiently. Reducing traffic congestion and fuel consumption are just two examples of putting telematics data to good use. Moreover, in the current automotive landscape, the green energy transition and AI are at the forefront of innovation. Telematics data can further these innovations to create more sustainable transportation systems. It can also act as a primary source of data to inform, structure, and build new AI models that make traveling safer and faster. The bottom line is that, as with any other data source, telematics data can be used to identify areas of improvement if analyzed properly.

Literature Review

In 2017, Uber launched the Movement website, which shared data Uber had gathered from cities around the world. The website aimed to promote better urban planning by offering data-driven insights into commuting. It offered specific information about travel, including travel conditions across different times of day, days of the week, and months of the year, as well as how travel times are affected by big events, road closures, or other things happening in a city. As a result, Uber Movement helped urban planners evaluate which parts of a city need to be expanded and manage infrastructure more effectively. Because it was open to the public, the website also helped commuters plan their trips and respond to disruptions in transportation systems. However, as of October 1, 2023, the website is no longer available, for reasons that have not been made public.

Past research has looked into the impact of the pandemic on shared mobility systems (Menon et al. 2020). Several shared mobility systems, including buses and taxis, were found to have been negatively impacted by the pandemic because some perceive them as “unsafe” due to the difficulty of social distancing. The crisis led to a 50 to 90 percent decline in transit ridership in major metropolitan areas, based on reports from transportation apps; even so, it was anticipated that low-income households would likely switch to public transportation during the pandemic restrictions.

The telematics dataset we were given is poorly documented: the existing data dictionary did not fully clarify what certain variables mean or how they relate to each other across tables. This required us to work out what the data means and how best to visualize it (e.g., visualizing real-time location data as the trajectory of a trip).

While past work, such as Uber Movement, has presented very detailed and comprehensive pictures of transportation and travel, it did not focus on making connections with other aspects of people’s lives. It is important, however, to understand the data within a broader socioeconomic context. We believe that household income is an important predictor of trip making and activity engagement, a view supported by past studies predicting that low-income households would be more likely to switch to public transportation than to private transportation during the pandemic (Taylor and Wasserman 2020).

Therefore, we aim to contextualize rideshare trip information by looking at trip patterns in relation to the change in median income during the pandemic at the county level. To visualize the patterns, we overlay census data on the trip request telematics data, specifically the pickup locations.

Our Contributions

In summary, our project contributes to the existing literature and resources in two main ways, corresponding to the first and second parts of the project. The first contribution is a single resource that people can use to learn about telematics data and how to load, visualize, and interpret it. We have documented the tips and information that helped us navigate such data so that others can avoid the problems we encountered. The second contribution is insight into the potential relationship between the percentage change in median household income and rideshare trip patterns from 2020 to 2021. While such changes might be attributed to the COVID-19 pandemic, it is still important to understand how household median income and the distribution of rideshare trips changed relative to the pandemic and to each other.

Methodology

Data

Our dataset was provided by 99P Labs. For our project, we focused on two of the eight tables made available: TRIP_REQUEST_202308141518.csv and VEHICLE_LOCATION_202308141525.csv. VEHICLE_LOCATION was our largest table, at around 150 MB.

The TRIP_REQUEST table comprises 17 variables; our analysis concentrated on two of them, PICKUP_ADDRESS and DROPOFF_ADDRESS. Using the tidygeocoder library, we converted these addresses into coordinates. In our Telematics Data 101 document, we then mapped the pickup and drop-off locations using these coordinates.

The VEHICLE_LOCATION table has a total of 5 variables. For our project, we used VEHICLE_ID, EVENT_TIMESTAMP, LAT, and LNG, and left out SPEED_MPH as unnecessary. In our Telematics Data 101 document, we mapped the route of a single vehicle on October 3rd, 2021. We limited the map to one vehicle because the sheer size of the table made it impractical to map more than one effectively.
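The route extraction described above amounts to filtering location records by vehicle and date and then ordering them by timestamp. The project itself was carried out in R; the following is only a minimal Python sketch of that logic. The column names come from the VEHICLE_LOCATION table, while the sample rows, the vehicle IDs, and the timestamp format are assumptions made for illustration.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical sample rows: the column names match the VEHICLE_LOCATION table,
# but the values and the timestamp format are invented for illustration.
SAMPLE_CSV = """VEHICLE_ID,EVENT_TIMESTAMP,LAT,LNG,SPEED_MPH
V1,2021-10-03 08:05:00,39.9650,-83.0020,25
V2,2021-10-03 08:01:00,41.6528,-83.5379,30
V1,2021-10-03 08:00:00,39.9612,-82.9988,0
V1,2021-10-04 09:00:00,39.9700,-83.0100,12
V1,2021-10-03 08:10:00,39.9690,-83.0070,18
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, not confirmed by the data dictionary


def vehicle_route(rows, vehicle_id, day):
    """Return (lat, lng) points for one vehicle on one day, in time order."""
    points = []
    for row in rows:
        stamp = datetime.strptime(row["EVENT_TIMESTAMP"], TIMESTAMP_FORMAT)
        if row["VEHICLE_ID"] == vehicle_id and stamp.date() == day:
            points.append((stamp, float(row["LAT"]), float(row["LNG"])))
    points.sort(key=lambda point: point[0])  # order the pings chronologically
    return [(lat, lng) for _, lat, lng in points]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
route = vehicle_route(rows, "V1", date(2021, 10, 3))
print(route)
```

The ordered list of coordinates is what a mapping library would draw as a polyline. Sorting matters because the rows of a large table are not guaranteed to be stored in chronological order.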

Methods

For part one of our project, the main tasks were loading the telematics data into RStudio, geocoding the addresses, and creating visualizations of the data. While loading our data into R, we faced multiple difficulties. Our dataset included a total of 8 tables, some small and others very large. The largest table, VEHICLE_LOCATION, was around 150 MB and took around 30 minutes to load on our local computers. We attempted to upload our data to our GitHub repository, but found that we could not upload files larger than 25 MB. To resolve this, we added these files to our .gitignore and moved our working directory to the RStudio server, which allowed us to load large datasets more efficiently than our local computers could. Next, we geocoded the pickup and drop-off addresses in the trip request dataset into their respective latitudes and longitudes. This allowed us to use the mapping package Leaflet to create interactive data visualizations.

In our code, we first used method = ‘osm’. With this method, it took around two hours to geocode all of the addresses in the dataset. In addition, our laptops had to stay open and connected to wifi; otherwise, the code would fail and we would have to restart it. Once we changed the method to ‘census’ and started using the rstudio2 server, our code ran in less than 10 minutes. The ‘census’ method also allowed us to geocode the locations without acquiring an API key, which some other methods require. We saved the resulting latitudes and longitudes into a data file (trip_data_full), which could then be visualized using the Leaflet interactive mapping package.

When we attempted to map all of the pickup or drop-off locations using Leaflet, the code would crash because of the number of locations. We therefore used the clustering functionality in Leaflet, which groups markers into clusters, shows the number of items in each cluster, and adjusts the clusters to the current view as one zooms in. The visualization of the pickup requests can be seen in Figure 1. Along with the trip request dataset, we were also interested in the vehicle location dataset. As seen in Figure 2, we created a visualization of the route taken by one particular vehicle on a particular day, which required filtering for a specific date and vehicle ID. Because the location of the vehicle is reported as a data point throughout the entire ride, we were able to plot these data points to display the entire route of the vehicle. Finally, we used the census data to visualize the median income per county and used color palettes to show the differences in income levels.
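To illustrate why clustering keeps a map with many markers manageable, the sketch below bins points into square grid cells and counts the points per cell. This is a simplified stand-in written in Python, not the algorithm Leaflet uses. The cell sizes and the sample coordinates are assumptions chosen only to show that a coarser grid (a zoomed-out view) produces fewer, larger clusters.

```python
import math
from collections import Counter


def cluster_counts(points, cell_degrees):
    """Count points per square grid cell of the given size (in degrees)."""
    counts = Counter()
    for lat, lng in points:
        cell = (math.floor(lat / cell_degrees), math.floor(lng / cell_degrees))
        counts[cell] += 1
    return counts


# Hypothetical pickup coordinates near cities mentioned in the paper.
pickups = [
    (39.96, -82.99),  # Columbus area
    (39.99, -82.98),  # Columbus area
    (40.06, -82.40),  # Newark area
    (41.65, -83.54),  # Toledo area
    (41.37, -82.11),  # Elyria area
]

zoomed_out = cluster_counts(pickups, 2.0)   # coarse grid: few large clusters
zoomed_in = cluster_counts(pickups, 0.25)   # fine grid: more, smaller clusters

print(len(zoomed_out), "clusters when zoomed out")
print(len(zoomed_in), "clusters when zoomed in")
```

Only one marker per occupied cell needs to be drawn, so the number of map elements depends on the grid resolution rather than on the number of trips.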

For part two of our project, we were interested in exploring the relationship between areas with a high concentration of pickup locations and changes in median household income. To do so, we paired American Community Survey (ACS) data with the trip request table from the telematics data that 99P Labs provided. The ACS is a project of the U.S. Census Bureau that annually gathers information about the characteristics of the American population and housing. We focused on the median household income of each county in Ohio in 2020 and 2021. We mapped the percent difference in median income between 2020 and 2021 and used color grading to show the magnitude and direction of the differences. We then paired the ACS data with the telematics data, specifically the pickup locations in 2020 and 2021. Clustering the locations in the telematics data highlighted the areas of concentrated rideshare requests. By pairing the ACS and telematics data, we hoped to find relationships between the concentrations of pickup locations and changes in median household income.
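The percent difference used for the color grading can be written as (income in 2021 - income in 2020) / income in 2020 x 100. The Python sketch below works through that calculation under stated assumptions: the county names and income figures are hypothetical placeholders rather than ACS values, and the mapping of sign to color is inferred from the description of the maps.

```python
def percent_change(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100.0


def direction(change):
    """Label the sign of a change, e.g. to choose a color on the map."""
    if change > 0:
        return "increase"
    if change < 0:
        return "decrease"
    return "no change"


# Hypothetical median household incomes (not actual ACS figures).
income_2020 = {"County A": 50000, "County B": 62000, "County C": 48000}
income_2021 = {"County A": 55000, "County B": 58900, "County C": 48000}

changes = {
    county: round(percent_change(income_2020[county], income_2021[county]), 1)
    for county in income_2020
    if county in income_2021  # keep only counties present in both years
}

for county, change in changes.items():
    print(county, change, direction(change))
```

With this sign convention, a negative value means the median income was higher in 2020 than in 2021.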

Results

Part One

For the first part of our project, we produced a document titled Telematics Data 101. It gives an overview of what telematics data is, how to load it and the challenges that come with doing so, how to geocode it, how to download census data so that it can be paired with telematics data, and how to visualize telematics data. In the Introduction and Motivation section and the Methods section of this paper, we discussed what telematics data is, why it is important to study, and how to handle the challenges of loading and geocoding large telematics datasets. All of this work laid the foundation for visualizing telematics data in different ways.

A few examples of the data visualizations we came up with are shown and briefly explained below:

Pick Up Requests

Figure 1: Pick Up Requests

Using a clustering mechanism, Figure 1 shows the clusters of pickup locations across the United States in the rideshare telematics data provided to us. Zooming in on a specific cluster reveals more precise pickup locations.

Vehicle Route


Figure 2: Vehicle route on a particular day

Figure 2 shows part of the route taken by a particular vehicle on a particular day around the Columbus area in Ohio. This route could include picking up or dropping off passengers as well as any stops made along the way. The figure gives an overview of the overall route that the vehicle took.

Median income

Figure 3: Median income across the state of Ohio in 2020

Contextualizing telematics data often requires secondary sources of data in order to discern a pattern. Figure 3 visualizes census data as one such secondary source. It shows the median income across the counties of Ohio in 2020.

Part Two

For the second part of this project, we focused on identifying patterns between the percentage change in median income and trends in pickup locations in Ohio from 2020 to 2021.

Figure 4: Layered map with 2020 pickup locations

Figure 5: Layered map with 2021 pickup locations

Several things stand out in our analysis. In Figure 4, we see that Ohio had a very small number of rideshare trips in 2020: 41, to be exact. To put this into context, 2020 was when COVID-19 started to spread: malls, theaters, and restaurants were closed, students attended classes over Zoom, and many people worked from home. Lockdowns shut down much of daily life, and people stayed home with few reasons, and few places, to go.

In the second visualization, with the 2021 pickup locations, we see that Ohio had 16,728 reported rideshare trips. This is 408 times as many trips as in 2020 (16,728 / 41 = 408). Again, when we consider what was going on during this time, 2021 was when the country slowly began to “open” again. Schools resumed, people went back to work, and life was slowly returning to the way it had been. We were finding out what it meant to be “normal” again.

One trend common to both years is that most of the pickup locations were around Columbus, the state’s capital, so it did not surprise us that this was the area with the most trips. In fact, in 2020 the pickup locations were only around Columbus. In 2021, however, the trips were more widely spread out, with pickup locations in Columbus, Cincinnati, Chillicothe, Newark, Toledo, and Elyria. Most pickup locations were split between the Columbus area and northern Ohio. The fact that 2021 had many more trips in more spread-out places reinforces the idea that the state was becoming more active again.

Now, let’s compare the percent differences in median income. The largest differences, whether positive or negative, also appear around the Columbus area, where many counties were affected. One example is Clinton County, which had a -35.8% median income difference between the two years. A negative difference means that the median income in the county was higher in 2020 than in 2021. Another noticeable difference appears in Fayette County (home to the city of Washington Court House), which had a -24.4% median income difference. While these counties had the most noticeable negative differences, the visualization shows that most counties followed the same trend. Many of them are shaded a very light red, meaning the difference was small, but the difference is still negative, meaning most counties in Ohio had higher median incomes in 2020 than in 2021. One possible reason is that the economy was still recovering in 2021 from the hit it took in 2020.

The counties with positive percent differences were Union County (59.9%), Greene County (43%), Pickaway County (30.8%), Fairfield County (25.3%), and Miami County (20.9%). These were the only 5 of Ohio’s 88 counties that had higher median incomes in 2021 than in 2020. Why were these counties not as affected? A possible explanation is that their businesses were not hit as hard as others during 2020, so median income there did not fall as drastically.

Looking at Figure 4, we notice that none of the trips originated in the red counties, that is, counties where median income fell. This may indicate that people in lower-income counties make less use of private transportation such as Uber or Lyft and may instead rely on public transportation. In both years, the private rideshare pickups occurred in counties with a positive percent difference. Whatever the size of the difference, our findings are consistent with previous findings that people from lower-income communities are less likely to use private transportation (Taylor and Wasserman 2020).

Conclusion

The main limitation was that, because the data dictionary for the dataset was not very comprehensive, our team had to make assumptions when interpreting the variables and the units of observation. In addition, even when we filtered the data for Ohio data points only, some coordinates outside the state slipped through: when we visualized the pickup and drop-off locations in Ohio, some of the data points appeared in Washington. We were not sure whether this was caused by the geocoding process or by mistakes in the addresses in the dataset. We suspect it has to do with how the data was collected and stored rather than with the code used to filter the tables.
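One way to catch stray coordinates like these is a coarse bounding-box check after geocoding. The Python sketch below is an illustration rather than the filter used in the project. The bounds are rounded approximations of Ohio’s extent, and the records are hypothetical. A bounding box also covers slivers of neighboring states, so a state boundary polygon would be needed for an exact test.

```python
# Approximate latitude/longitude extent of Ohio. These rounded bounds are an
# assumption for illustration; a boundary polygon would be more precise.
OHIO_BOUNDS = {"lat_min": 38.40, "lat_max": 42.00, "lng_min": -84.83, "lng_max": -80.51}


def in_bounds(lat, lng, bounds=OHIO_BOUNDS):
    """Return True if the point falls inside the bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lng_min"] <= lng <= bounds["lng_max"])


def split_by_bounds(records, bounds=OHIO_BOUNDS):
    """Separate geocoded records into those inside and outside the box."""
    inside, outside = [], []
    for record in records:
        target = inside if in_bounds(record["lat"], record["lng"], bounds) else outside
        target.append(record)
    return inside, outside


# Hypothetical geocoded records; the third lies far outside Ohio.
geocoded = [
    {"address": "hypothetical address 1", "lat": 39.9612, "lng": -82.9988},
    {"address": "hypothetical address 2", "lat": 41.6528, "lng": -83.5379},
    {"address": "hypothetical address 3", "lat": 47.6062, "lng": -122.3321},
]

kept, flagged = split_by_bounds(geocoded)
print(len(kept), "records inside the box")
print([record["address"] for record in flagged], "flagged for review")
```

Reviewing the flagged addresses by hand would help distinguish a geocoding mismatch from an error in the source address.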

As for future steps, a research group interested in extending our work could look at more, or all, states in the U.S. We chose Ohio to keep the scope manageable and because Honda has major operations in Ohio, but the same approach can be applied to other states. Another interesting factor to look at would be post-COVID data. Unfortunately, no post-COVID census data was available when we started the project. Since we used data from 2020 and 2021 (the beginning and middle of the pandemic), it would be interesting to see what these trends look like in a post-COVID world. Our project was also limited in that we only looked at median income levels. The census data contains a vast amount of information, including poverty rates, transportation, and much more. Our project compared median income in 2020 and 2021 while also looking at rideshare pickup locations in those years.

Bibliography

Menon, Nikhil, Yaye Keita, Robert L. Bertini, et al. 2020. “Impact of COVID-19 on Travel Behavior and Shared Mobility Systems.”
Taylor, B., and J. Wasserman. 2020. “For the Press: Transportation, Coronavirus and COVID-19.” Institute of Transportation Studies.
